D3S: Debugging Deployed Distributed Systems

Authors

  • Xuezheng Liu
  • Zhenyu Guo
  • Xi Wang
  • Feibo Chen
  • Xiaochen Lian
  • Jian Tang
  • Ming Wu
  • M. Frans Kaashoek
  • Zheng Zhang
Abstract

Testing large-scale distributed systems is a challenge, because some errors manifest themselves only after a distributed sequence of events that involves machine and network failures. D3S is a checker that allows developers to specify predicates on distributed properties of a deployed system, and that checks these predicates while the system is running. When D3S finds a problem, it produces the sequence of state changes that led to the problem, allowing developers to quickly find the root cause. Developers write predicates in a simple, sequential programming style, while D3S checks these predicates in a distributed and parallel manner so that checking scales to large systems and is fault tolerant. By using binary instrumentation, D3S works transparently with legacy systems and can change the predicates to be checked at runtime. An evaluation with 5 deployed systems shows that D3S can detect non-trivial correctness and performance bugs at runtime with low performance overhead (less than 8%).
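To make the checking model in the abstract concrete, here is a minimal, hypothetical sketch. Instrumented nodes expose state changes, a predicate written in plain sequential style is evaluated over the reconstructed global state, and on a violation the checker returns the sequence of state changes that led to it. The lock-service invariant, the `StateChange` record, and the functions `no_conflicting_locks`, `check`, and `partition_by_lock` are illustrative assumptions, not D3S's actual API. The sketch replays changes in a single process and does not model binary instrumentation or D3S's distributed, fault-tolerant checking.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class StateChange:
    """One state update exposed by an instrumented node (hypothetical format)."""
    time: int      # logical timestamp of the change
    node: str      # client machine that exposed the state
    lock_id: str   # lock whose status changed
    mode: str      # "X" (exclusive), "S" (shared), or "" (released)


# Global view at one timestamp: (node, lock_id) -> current mode.
Snapshot = Dict[Tuple[str, str], str]


def no_conflicting_locks(snapshot: Snapshot) -> bool:
    """Sequential-style predicate: an exclusive holder must be the only holder."""
    holders: Dict[str, List[str]] = defaultdict(list)
    for (_node, lock_id), mode in snapshot.items():
        if mode:  # skip released entries
            holders[lock_id].append(mode)
    return all(not ("X" in modes and len(modes) > 1) for modes in holders.values())


def check(changes: Iterable[StateChange],
          predicate: Callable[[Snapshot], bool]) -> Optional[List[StateChange]]:
    """Replay changes in timestamp order and test the predicate at each timestamp.

    Returns the state changes up to and including the first violation,
    or None if the predicate holds throughout.
    """
    ordered = sorted(changes, key=lambda c: (c.time, c.node, c.lock_id))
    snapshot: Snapshot = {}
    history: List[StateChange] = []
    i = 0
    while i < len(ordered):
        t = ordered[i].time
        # Apply every change with this timestamp before evaluating the predicate.
        while i < len(ordered) and ordered[i].time == t:
            c = ordered[i]
            snapshot[(c.node, c.lock_id)] = c.mode
            history.append(c)
            i += 1
        if not predicate(snapshot):
            return history
    return None


def partition_by_lock(changes: Iterable[StateChange]) -> Dict[str, List[StateChange]]:
    """Split changes by lock so each partition can be checked independently."""
    parts: Dict[str, List[StateChange]] = defaultdict(list)
    for c in changes:
        parts[c.lock_id].append(c)
    return dict(parts)


# Usage: client-c takes an exclusive lock while client-b still holds it shared.
changes = [
    StateChange(1, "client-a", "L1", "X"),
    StateChange(2, "client-a", "L1", ""),
    StateChange(3, "client-b", "L1", "S"),
    StateChange(4, "client-c", "L1", "X"),
    StateChange(1, "client-d", "L2", "S"),
]
for lock_id, part in partition_by_lock(changes).items():
    trace = check(part, no_conflicting_locks)
    if trace is not None:
        print(f"violation on {lock_id}:", [(c.time, c.node, c.mode) for c in trace])
```

Because this particular predicate only relates holders of the same lock, `partition_by_lock` splits the change stream so that each partition could be checked independently, for example on a separate checker machine. This is one simple way to parallelize checking and is an assumption of the sketch, not a description of how D3S partitions its work.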

Similar resources

Live debugging of distributed systems

Debugging distributed systems is challenging. Although incremental debugging during development finds some bugs, developers are rarely able to fully test their systems under realistic operating conditions prior to deployment. While deploying a system exposes it to realistic conditions, debugging requires the developer to: (i) detect a bug, (ii) gather the system state necessary for diagnosis, a...

Comparison, Replay, and Refinement of Communication Traces for Debugging Distributed Failures

An increasing number of companies build their business on distributed Web applications. Hosting providers respond to that demand and have made it easier to deploy systems that spread across multiple services. However, this trend has outpaced the development of adequate debugging tools and developers still have to rely on an improvised patchwork of symbolic debuggers and printf debugging to find fail...

Towards Lightweight Logging and Replay of Embedded, Distributed Systems (Invited Paper)

Due to their safety-critical nature, Cyber-Physical Systems such as collaborative cars or smart grids demand thorough testing and evaluation. However, debugging such systems during deployment is challenging, due to the concurrent nature of distributed systems and the limited insight that any deployed system offers. In this paper we introduce MILD, providing Minimal Intrusive Logging and Det...

Peeking into Spammer Behavior from a Unique Vantage Point

...from the logs. This can entail considerable developer effort, and getting just the right level of logging can require many iterations: Too much logging can produce unacceptable overhead, but too little will miss key state changes. And even after the logs are captured, analysis remains challenging. D3S attempts to simplify the process of runtime assertion checking...

Publication date: 2008